The genetic contribution to the variation in human lifespan is ~25%. Despite the large number of identified 
disease-susceptibility loci, it is not known which loci influence population mortality. We performed a 
genome-wide association meta-analysis of 7729 long-lived individuals of European descent (>85 years) and 
16 121 younger controls (<65 years) followed by replication in an additional set of 13 060 long-lived individuals 
and 61 156 controls. In addition, we performed a subset analysis in cases aged > 90 years. We observed genome- 
wide significant association with longevity, as reflected by survival to ages beyond 90 years, at a novel locus, 
rs2149954, on chromosome 5q33.3 (OR = 1.10, P = 1.74 × 10⁻⁸). We also confirmed association of rs4420638 
on chromosome 19q13.32 (OR = 0.72, P = 3.40 × 10⁻³⁶), representing the TOMM40/APOE/APOC1 locus. In a 
prospective meta-analysis (n = 34 103), the minor allele of rs2149954 (T) on chromosome 5q33.3 associates 
with increased survival (HR = 0.95, P = 0.003). This allele has previously been reported to associate with low 
blood pressure in middle age. Interestingly, the minor allele (T) associates with decreased cardiovascular 
mortality risk, independent of blood pressure. We report on the first GWAS-identified longevity locus on chromosome 
5q33.3 influencing survival in the general European population. The minor allele of this locus associates 
with low blood pressure in middle age, although the contribution of this allele to survival may be less dependent 
on blood pressure. Hence, the pleiotropic mechanisms by which this intragenic variation contributes to lifespan 
regulation have to be elucidated. 
INTRODUCTION 

Worldwide, human life expectancy has increased remarkably 
over the last two centuries (1), although the healthy life expect- 
ancy lags behind. Citizens of the European Union, for example, 
spend only 75 - 80% of their lifespan in good health (2). Families 
in which longevity clusters form an exception in this sense, by 
showing beneficial or 'youthful' profiles for many metabolic 
and immune-related parameters (3-7) and a low prevalence of 
common diseases from middle age onwards (5,8,9). Therefore, 
the genome of long-lived individuals is investigated to identify 
variants that promote healthy aging and protect against 
age-related disease. This is a major challenge because the 
genetic component of lifespan variation in the population at 
large has been estimated to be only ~25% (10,11) and is 
assumed to be determined by many, still uncharacterized, 
genes (12,13). Genetic influences on human longevity are 
expected to reflect longevity assurance mechanisms acting 
across species (14), as well as more heterogeneous population- 
specific effects. Although numerous genome-wide association 
studies (GWAS) have successfully identified loci involved in 
common, age-related diseases (15), the corresponding suscepti- 
bility loci do not explain the genetic component of human lon- 
gevity (16). GWAS for human longevity have thus far failed to 
identify genome-wide significant loci, besides the well-known 
TOMM40/APOE/APOC1 locus (17-19). 

In this paper, we conducted a large genome-wide association 
meta-analysis of human longevity in 14 studies with long-lived 
cases (>85 years) and younger controls (<65 years) of 
European descent. In addition, we performed a subset analysis 
in cases aged >90 years. The novel longevity locus we identified 
was tested for association with prospective (cause-specific) mortality 
in a meta-analysis of 11 European cohorts and examined 
for association with various metabolic traits that may explain 
the mechanism by which the locus contributes to survival to 
high ages. 



RESULTS 

Genome-wide association analysis 

In order to identify novel loci involved in lifespan regulation, we 
conducted a meta-analysis on GWAS data of 7729 long-lived 
cases (>85 years) and 16 121 younger controls (<65 years) 
from 14 studies originating from 7 European countries (Supplementary 
Material, Table S1). For each study, cases and controls 
originated from the same country. Given the higher heritability 
of longevity at older ages (11,20), we performed a subset analysis 
in which we compared cases aged >90 years (n = 5406) 
with 15 112 controls (<65 years) from the corresponding 
control cohorts. Replication was performed in 13 060 cases aged 
>85 years (of which 7330 were >90 years) and 61 156 controls 
from 6 additional studies, of which 3 originated from European 
countries not represented in the discovery phase meta-analysis 
(Supplementary Material, Table S1). Analysis of each study 
was performed using a logistic regression-based method, and 
results were adjusted for study-specific genomic inflation 
factors (λ) (Supplementary Material, Table S2). Meta-analysis 
was performed on 2 480 356 (>85 years) and 2 470 825 (>90 
years) imputed SNPs using a fixed-effect approach, and results 
were further adjusted for the overall genomic inflation factor 
(λ = 1.019) (Supplementary Material, Fig. S1). A flow chart 
of the consecutive analysis steps is depicted in Figure 1. 

The discovery phase meta-analysis of the cases aged >85 
years (n = 7729) showed genome-wide significant association 
with survival into old age at one locus, the previously identified 
TOMM40/APOE/APOC1 locus (17,21) (rs4420638 (G); odds 
ratio (OR) = 0.71, P = 6.14 × 10⁻¹⁹; Table 1). No gender-dependent 
effects were observed in the sex-stratified analysis 
of the cases aged >85 years (Supplementary Material, Table S4). 
The discovery-phase meta-analysis of the cases aged >90 
years (n = 5406) showed a similar result, i.e. the 
TOMM40/APOE/APOC1 locus was the only genome-wide significant 
locus (OR = 0.64, P = 4.09 × 10⁻²¹; Fig. 2 and Table 2). The 
regional association plot and forest plot for the 
TOMM40/APOE/APOC1 locus are depicted in Figures 3 and 4, respectively. 
Although several SNPs on chromosome 19q13.32, which are 
in moderate linkage disequilibrium (LD) with rs4420638, show 
additional association with survival into old age, meta-analysis 
conditional on rs4420638 showed no independent associations 
among these SNPs (Supplementary Material, Fig. S2 and Table S3). 

Replication 

In addition to the TOMM40/APOE/APOC1 locus, we found 
eight loci that showed suggestive evidence for association in 
the discovery-phase meta-analysis of cases aged >85 years 
(P < 1 x 10~ ; Table 1), whereas six additional SNPs met this 
criterion in the meta-analysis of cases aged >90 years 
(Table 2). For each of these 14 loci and the 
TOMM40/APOE/APOC1 locus, the most significant SNP (or, when it 
could not be measured successfully, the second most significant 
SNP) was taken forward for replication 
in 13 060 cases aged >85 years (of which 7330 were also 
>90 years) and 61 156 controls from 6 additional studies. In 
the joint analysis of the discovery and replication phase of the 
cases aged >85 years (9 loci), the TOMM40/APOE/APOC1 
locus remained the only genome-wide significant locus 
(Table 1). The joint analysis of the discovery and replication 
phase of the cases aged >90 years (12 loci), however, showed 
an additional genome-wide significant locus, rs2149954 (T), 
on chromosome 5q33.3 (OR = 1.10, P = 1.74 × 10⁻⁸; 
Table 2). Although the association of this SNP with survival 
up to 85 years is not genome-wide significant (OR = 1.07, 
P = 4.34 × 10⁻⁶; Table 1), the locus likely affects survival 
from middle age onwards. The regional association plot (based 
on the discovery phase only) and forest plot of this locus are 
depicted in Figures 3 and 4, respectively. Analysis conditional 
on rs4420638 in the discovery phase studies showed that the 
association of rs2149954 (T) with survival is independent of the 
TOMM40/APOE/APOC1 locus (P = 7.20 × 10⁻⁶ instead of 
P = 5.98 × 10⁻⁶ in the analysis of survival up to 85 years). 

Prospective analysis 

To determine the association of rs4420638 (TOMM40/APOE/APOC1 
locus) and rs2149954 (chromosome 5q33.3 locus) 
with longitudinal survival, we performed a prospective 
meta-analysis of the 2 SNPs in 34 103 individuals aged 30-105 
years from 11 different cohorts, of which 8582 had died 
after a mean follow-up time ranging from 2.2 to 17.4 years 
(Supplementary Material, Table S5). Carriers of the minor allele 
of rs4420638 (G) showed significantly higher all-cause mortality 
(hazard ratio (HR) = 1.07, P = 0.019), whereas carriers of 
the minor allele of rs2149954 (T) demonstrated significantly 
lower all-cause mortality (HR = 0.95, P = 0.003; Supplementary 
Material, Table S6). 

Figure 1. Flow chart of experimental work. The analysis in the cases aged >90 years is a subset analysis of the analysis in the cases aged >85 years. Twelve out of 14 
studies used for the discovery phase analysis of cases aged >85 years contained at least 100 cases over 90 years of age and were thus analyzed in the subset analysis of 
cases aged >90 years. 

Association with cardiovascular disease and blood pressure 

To gain insight into the mechanism by which the chromosome 
5q33.3 locus might promote human longevity, we analyzed the 
association of rs2149954 with cause-specific mortality. Carriers of the minor 
allele of rs2149954 have a lower mortality risk for cardiovascular 
disease (CVD) (HR = 0.86, P = 0.004), which mainly 
appeared to be caused by protection from stroke (HR = 0.60, 
P = 2.27 × 10⁻⁷). In addition, we observed an effect of this 
SNP on non-CVD mortality (HR = 0.86, P = 0.002) (Supplementary 
Material, Table S7). We also examined the Coronary 
ARtery Disease Genome-Wide Replication And Meta-Analysis 
(CARDIoGRAM) GWAS (23), which showed a significant 
association of rs2149954 with a decreased risk for coronary 
artery disease (CAD) (OR = 0.96, P = 0.011) (Supplementary 
Material, Table S8). In addition, two SNPs on chromosome 
5q33.3 in high LD with rs2149954, rs9313772 (r² = 0.928) 
and rs11953630 (r² = 0.854), have previously been reported to 
associate with blood pressure and hypertension (24,25). As 
expected, examining rs2149954 in the International Consortium 
for Blood Pressure GWAS (24) showed a significant association 
of the minor allele with lower diastolic (P = 3.46 × 10⁻⁵) and 
systolic (P = 6.55 × 10⁻⁶) blood pressure (Supplementary 
Material, Table S9). Despite the highly interesting association of 
the minor allele of rs2149954 with low blood pressure and a 
decreased risk for CAD, stroke and mortality, its association 
with decreased all-cause mortality was not influenced by 
blood pressure in two studies of participants aged >75 years 
(PROSPER and Leiden 85-plus study Cohort II; Supplementary 
Material, Table S10). This may indicate that at higher ages, this 
locus influences longevity via pathways additional to those 
involved in blood pressure regulation. 

Table 1. Results of the discovery phase, replication phase and joint analysis of cases aged >85 years 

Locus | Lead SNP | Chromosome | Position | Candidate/closest gene | EA | Analysis | n cases | n controls | EAF cases | EAF controls | OR | 95% CI | P | I² (%) | P_het
1q43 | rs1625040 | 1 | 235 213 002 | MTR, RYR2 | A | Discovery | 7729 | 16 121 | 0.170 | 0.150 | 1.16 | 1.09-1.23 | 3.36 × 10⁻⁶ | |
 | | | | | | Replication | 13 027 | 60 914 | 0.178 | 0.182 | 1.02 | 0.98-1.07 | 0.216 | |
 | | | | | | Joint | 20 756 | 77 035 | | | 1.07 | 1.03-1.10 | 3.50 × 10⁻⁴ | 31.0 | 0.093
2q24.3 | rs6432832 | 2 | 166 079 072 | CSRNP3 | A | Discovery | 7729 | 16 121 | 0.344 | 0.321 | 1.12 | 1.07-1.17 | 2.79 × 10⁻⁶ | |
 | | | | | | Replication | 13 019 | 60 824 | 0.346 | 0.339 | 1.03 | 1.00-1.07 | 0.029 | |
 | | | | | | Joint | 20 748 | 76 945 | | | 1.06 | 1.03-1.09 | 8.73 × 10⁻⁶ | 0.0 | 0.467
4q27 | rs13114426 | 4 | 120 942 533 | PDE5A, MAD2L1 | T | Discovery | 7729 | 16 121 | 0.387 | 0.405 | 0.90 | 0.87-0.95 | 2.20 × 10⁻⁵ | |
 | | | | | | Replication | 13 024 | 60 932 | 0.364 | 0.351 | 1.00 | 0.97-1.04 | 0.711 | |
 | | | | | | Joint | 20 753 | 77 053 | | | 0.97 | 0.94-0.99 | 0.033 | 46.5 | 0.012
5q33.3 | rs2149954 | 5 | 157 753 180 | EBF1 | T | Discovery | 7729 | 16 121 | 0.388 | 0.360 | 1.12 | 1.07-1.17 | 5.98 × 10⁻⁶ | |
 | | | | | | Replication | 12 973 | 60 262 | 0.365 | 0.352 | 1.04 | 1.01-1.07 | 0.013 | |
 | | | | | | Joint | 20 702 | 76 383 | | | 1.07 | 1.04-1.09 | 4.34 × 10⁻⁶ | 28.2 | 0.118
8q13.3 | rs10957550ᵃ | 8 | 72 457 142 | EYA1 | A | Discovery | 7727 | 16 093 | 0.268 | 0.285 | 0.88 | 0.84-0.93 | 3.61 × 10⁻⁶ | |
 | | | | | | Replication | 10 056 | 56 262 | 0.236 | 0.244 | 0.95 | 0.92-0.99 | 0.012 | |
 | | | | | | Joint | 17 783 | 72 355 | | | 0.92 | 0.90-0.95 | 1.41 × 10⁻⁶ | 29.4 | 0.130
10q23.33 | rs4466755 | 10 | 96 622 243 | CYP2C19, CYP2C9 | T | Discovery | 7729 | 16 121 | 0.454 | 0.443 | 1.12 | 1.07-1.16 | 2.72 × 10⁻⁶ | |
 | | | | | | Replication | 13 051 | 61 105 | 0.488 | 0.508 | 0.98 | 0.95-1.01 | 0.129 | |
 | | | | | | Joint | 20 780 | 77 226 | | | 1.03 | 1.00-1.05 | 0.161 | 65.6 | 2.15 x
17q23.3 | rs17760362 | 17 | 58 772 399 | TANC2 | A | Discovery | 7729 | 16 121 | 0.252 | 0.233 | 1.13 | 1.07-1.19 | 5.38 × 10⁻⁶ | |
 | | | | | | Replication | 13 007 | 60 679 | 0.252 | 0.249 | 1.04 | 1.00-1.07 | 0.033 | |
 | | | | | | Joint | 20 736 | 76 800 | | | 1.07 | 1.04-1.10 | 1.56 × 10⁻⁵ | 0.0 | 0.473
19q13.32 | rs4420638ᵃ | 19 | 50 114 786 | APOE | G | Discovery | 7728 | 16 111 | 0.157 | 0.195 | 0.71 | 0.67-0.77 | 6.14 × 10⁻¹⁹ | |
 | | | | | | Replication | 10 165 | 57 126 | 0.180 | 0.202 | 0.87 | 0.83-0.91 | 2.12 × 10⁻¹² | |
 | | | | | | Joint | 17 893 | 73 237 | | | 0.82 | 0.79-0.85 | 2.33 × 10⁻²⁶ | 80.2 | 4.35 x
20q13.2 | rs8126377 | 20 | 51 590 254 | TSHZ2, ZNF217 | G | Discovery | 7532 | 15 902 | 0.059 | 0.069 | 0.79 | 0.71-0.87 | 1.35 × 10⁻⁵ | |
 | | | | | | Replication | 12 974 | 60 647 | 0.058 | 0.054 | 1.01 | 0.94-1.08 | 0.901 | |
 | | | | | | Joint | 20 506 | 76 549 | | | 0.93 | 0.88-0.99 | 0.020 | 51.1 | 0.006

EA, effect allele; EAF, effect allele frequency after pooling the data of all analyzed individuals; OR, odds ratio for the effect allele; 95% CI, 95% confidence interval; I², heterogeneity statistic; P_het, P-value for 
heterogeneity. 

ᵃGenotyping of these SNPs with the Sequenom MassARRAY system for the replication phase was unsuccessful. The SNPs in bold overlap with Table 2. 

Human Molecular Genetics, 2014, Vol. 23, No. 16 4425 

Figure 2. Results of the discovery phase analysis. Manhattan plots presenting the 
−log10 P-values (y-axis) by chromosome (x-axis) from the discovery phase analysis of cases aged >85 years (A) 
and >90 years (B). The loci that showed a genome-wide significant association 
after the joint analysis of the discovery and replication phase (chromosome 
19q13.32 and 5q33.3) are shown in red. 



Phenotypic characterization and pathway analysis 

In an attempt to identify the underlying mechanism by which this 
novel longevity locus at chromosome 5q33.3 could influence 
human longevity, we examined rs2149954 in the published 
data of several large GWAS consortia for association with metabolic 
traits in generally middle-aged individuals. None of the 
investigated traits, i.e. 2 h glucose (OGTT), HbA1c, fasting 
glucose, fasting insulin, insulin resistance (HOMA-IR), β-cell 
activity (HOMA-B), total/HDL/LDL cholesterol, triglycerides 
and type 2 diabetes (26-32), demonstrated evidence for association 
(all P > 0.05) with rs2149954 (Supplementary Material, 
Tables S8 and S9). 



Gene set enrichment analysis (GSEA) of the meta-analysis 
results of the discovery-phase analysis of survival to ages >90 
years using Meta-Analysis Gene-set Enrichment of variaNT 
Associations (MAGENTA) (33), as well as examination of 
interconnectivity of implicated genes using Gene Relationships 
Across Implicated Loci (GRAIL) (34) (Supplementary Material, 
Fig. S3 and Table S11), provided no firm clues for potential 
pathways involved in human longevity. 



Fine mapping and functional characterization 

The newly identified longevity locus on chromosome 5q33.3 is 
located in an intergenic region, 302 kb 
downstream of the EBF1 gene. To determine the functional 
impact of this locus, we first identified the SNPs in LD with 
rs2149954 (r² > 0.8) using the 1000 Genomes CEU Phase 1 
data implemented in HaploReg v2 
(http://www.broadinstitute.org/mammals/haploreg/haploreg.php) (35). In total, we 
identified 25 SNPs, spanning a region of ~22.3 kb (Supplementary 
Material, Table S12). Subsequently, we examined the potential 
effects of these SNPs on gene expression using several eQTL 
databases. None of the SNPs showed an association with gene 
expression in the various examined tissues, so it is still unclear 
in which tissue(s) the locus exerts its longevity-promoting 
effect. We did, however, find some promising functional 
implications of this locus, i.e. the presence of multiple DNase I 
hypersensitivity sites, transcription factor binding sites and enhancer 
histone marks, by exploring ENCODE data using HaploReg 
v2 (35) and RegulomeDB (http://www.regulomedb.org/) (36) 
(Supplementary Material, Table S12). Very recently, a large 
intergenic non-coding RNA (lincRNA), RP11-524N5.1, has 
been annotated right on top of our locus. The poly(A) features 
of this lincRNA are supported by PolyA-seq reads from liver, 
muscle and testis. PhastCons 44-way alignment supports conser- 
vation of the transcription start site (TSS), 3' UTR and the third, 
fifth and last exon of the lincRNA transcript (Fig. 5). The tran- 
script does not align to the mouse genome, but orthologous tran- 
scripts are found in other primate genome sequences, suggesting 
that this is a primate-specific lincRNA. 



DISCUSSION 

We have performed the largest genome-wide association 
meta-analysis for human longevity, in which a novel locus on 
chromosome 5q33.3 associating with survival beyond 90 years 
was identified. 

The minor allele of rs2149954 (T) promotes human longevity 
by reducing the risk of mortality owing to stroke and 
non-cardiovascular causes. In addition, this allele has previously 
been associated with low blood pressure, which may explain the 
protection from CVD mortality risk in middle age. At ages 
above 80 years, however, low systolic blood pressure (SBP) 
associates with increased mortality (37,38). Hence, the observed 
blood pressure-independent 
association of the minor allele with mortality at ages >75 years may be 
due to pleiotropic effects on other mortality-related clinical 
parameters. Examination of publicly available data of several large 
GWAS consortia for association of the locus with parameters 
related to glucose and fat metabolism provided as yet no clues 
for other potentially involved mechanisms. 



Table 2. Results of the discovery phase, replication phase and joint analysis of cases aged >90 years 
EA, effect allele; EAF, effect allele frequency after pooling the data of all analyzed individuals; OR, odds ratio for the effect allele; 95% CI, 95% confidence interval; I², heterogeneity statistic; P_het, P-value for 
heterogeneity. 

ᵃGenotyping of this SNP with the Sequenom MassARRAY system for the replication phase was unsuccessful. The SNPs in bold overlap with Table 1. 

Figure 3. Regional association plots for the chromosome 19q13.32 and 5q33.3 loci. Results of the discovery-phase analysis of chromosome 19q13.32 (A) and 5q33.3 
(B) in cases aged >90 years, generated using LocusZoom (http://csg.sph.umich.edu/locuszoom/) (22). For the two SNPs taken forward to the replication phase 
(rs4420638 and rs2149954), the results of the joint analysis are plotted. The color of the SNPs is based on the LD with the lead SNP (shown in purple). The blue 
peaks represent the recombination rates based on HapMap Phase I+II CEU release 22 (hg18/build36), and the RefSeq genes in the region are shown in the lower panel. 

Figure 4. Forest plots for rs4420638 and rs2149954. Forest plots representing the odds ratios with 95% CI of rs4420638 (A) and rs2149954 (B) for the cohorts analyzed 
in the discovery and replication phase (>90 years). The size of the boxes represents the sample size of the cohort. 



Rs2149954 is located in an intergenic region on chromosome 
5q33.3 between CLINT1 and EBF1. The presence of several 
regulatory elements in this region implies that transcription 
factor binding and/or expression of (nearby) genes could be 
influenced. The currently available eQTL databases did not 
provide evidence for such effects, which might be due to the 
limited tissue diversity of the databases. The effects of the 
chromosome 5q33.3 locus on human longevity might be 
exerted through the lincRNA, which has recently been annotated 
right on top of our locus (RP11-524N5.1) and shows evidence for 
expression in liver, muscle and testis. LincRNAs are involved in 
chromatin modification and transcriptional regulation (39) and 
seem to play a role in human disease (40). However, the newly 
annotated lincRNA is not yet available in the large eQTL 
databases, and the effect of SNPs in the chromosome 5q33.3 locus 
on expression of this transcript still needs to be determined. 
Hence, further functional studies are required to illuminate the 
mechanism by which this locus influences human longevity. 
Figure 5. Chromosomal region around rs2149954. The region contains a lincRNA (RP11-524N5.1) for which the poly(A) features are supported by PolyA-seq reads 
from liver, muscle and testis. RP11-524N5.1 is transcribed from the negative strand, and the PhastCons 44-way alignment supports conservation of the TSS, 3' UTR 
and the third, fifth and last exon of the transcript. Rs2149954 and the 25 SNPs in high LD (r² > 0.8, according to HaploReg v2 (35)) are located in the first intron of 
RP11-524N5.1. 



GWAS has thus far not been a successful approach to identify 
genome-wide significant hits for human longevity or mortality 
besides the well-known TOMM40/APOE/APOC1 locus (17-19). 
The FOXO3A locus, for which the longevity effect is most prominent 
in individuals aged >100 years (41), showed only moderate 
evidence for association with survival >90 years in the discovery 
phase of our GWAS (lowest P = 1.35 × 10⁻⁴ (rs1268161)). 
Sebastiani and colleagues suggested that human longevity might 
be explained by a signature consisting of 281 SNPs (42). 
However, none of the SNPs (except the already known SNP 
rs2075650 in TOMM40) was significant after adjustment for 
multiple testing (P < 1.78 × 10⁻⁴ (0.05/281)). In addition, we did not 
observe an enrichment of significant SNPs from their signature 
in our data (λ = 1.004, Supplementary Material, Fig. S4). 
Because the association of SNPs other than the 
TOMM40/APOE/APOC1 locus could not be replicated in this, much 
larger, GWAS, we have doubts that these signature SNPs are 
indeed candidate SNPs influencing human longevity. Although 
we detected merely one novel genome-wide significant locus, 
the current GWAS had sufficient power, based on our results, 
to detect lifespan-regulating loci with relatively small effects 
(OR < 0.9 and > 1.1). 
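
The multiple-testing threshold quoted above (0.05/281) corresponds to a Bonferroni correction over the 281 signature SNPs. The arithmetic can be sketched as follows; the function name is hypothetical and is used for illustration only.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


# 281 SNPs in the signature, family-wise alpha of 0.05 (both taken from the text).
signature_threshold = bonferroni_threshold(0.05, 281)
print(f"{signature_threshold:.2e}")  # prints 1.78e-04
```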

The genetic component of human longevity is small (~25%) 
(10,11) and is assumed to be determined by many genes (12,13). 
Furthermore, the genetic heterogeneity in ageing and lifespan 
regulation is expected to be high, because individual genes 
may contribute by a diversity of late acting deleterious stochastic 
(germline) variation resulting in a genetic component that is hard 
to disentangle (13). GWAS of complex late-onset diseases, such 
as osteoarthritis and Alzheimer's disease, with sample sizes 
comparable to our current study (43-45), have identified more 
loci than GWAS of longevity. This most likely reflects 
the greater inherent complexity of the longevity trait, with its 
diverse spectrum of biological pathways subject to intrinsic 
and extrinsic (environmental) interactions. Hence, even larger 
GWAS (>50 000 long-lived individuals) may be required to 
identify additional longevity loci, preferably in the most strin- 
gent phenotype, i.e. the oldest old. 

As survival to ages >85 or 90 years is relatively common in 
Western populations, the human longevity trait suffers from 
etiological heterogeneity. Lifespan extension in the past generations 
owing to non-genetic factors likely created phenocopies diluting 
the genetic component of survival to ages >85 years. The 
genetic contribution to survival to ages >100 years is higher but 
will render smaller sample sizes for GWAS. This may explain 
why the novel locus on chromosome 5q33.3 was only 
genome-wide significant in the subset analysis of cases aged >90 years. 
For the same reason, a large number of individuals from the 
control groups (up to 50%, depending on the gender and year of 
birth of the individuals and demography of the cohort) will live 
to ages >85 years. In 2011, the mean life expectancy at age 65 
in Europe was 21.3 years for women and 17.8 years for men 
(http://epp.eurostat.ec.europa.eu/portal/page/portal/product_details/dataset?P_product_code=TSDDE210), 
which makes selection of proper controls a challenging issue. The most ideal controls 
would be individuals from the same birth cohort as the long-lived 
cases that survived to the mean age of death of that birth cohort. 
However, for most of these individuals there is no DNA available. 
As an alternative, we selected controls that had not yet reached the 
age of 65 years at inclusion to represent the frequency of variants 
in the general population and minimize selection owing to 
mortality. Hence, the low contrast between cases and controls likely has 
reduced our probability of identifying longevity loci. 

In addition, there will be differences between case and control 
cohorts that may have had an impact on our results. An example of 
a potential confounder is smoking behavior, which was not 
adequately measured in most elderly cohorts. However, none of 
the SNPs that were previously associated with smoking behavior 
in cohorts of European descent (according to the NHGRI 
GWAS Catalog (http://www.genome.gov/gwastudies/)), namely 
rs1051730, rs1329650 and rs4105144, show differences 
between cases (>85 years) and controls in the joint analysis of 
the discovery and replication phase (all P > 0.05). We have to 
note that these SNPs only explain a small proportion of the 
variance observed in smoking behavior. However, as the frequency 
of these proxy SNPs for smoking behavior is similar between 
cases and controls, we expect no obvious differences in 
smoking behavior between the groups. 

In conclusion, besides the previously implicated 
TOMM40/APOE/APOC1 locus, we identified a novel locus on chromosome 
5q33.3 that associates with survival beyond 90 years. Although 
rs2149954 is associated with survival beyond 90 years at a 
genome-wide significant level in our study, replication in additional 
cohorts of European as well as non-European descent is warranted. 
The minor allele of the lead SNP at this locus, rs2149954, 
promotes human longevity in a prospective meta-analysis by 
lowering the risk of mortality owing to stroke and non-cardiovascular 
causes. The locus harbors a lincRNA and is implicated in blood 
pressure regulation, but the mechanism by which it influences 
longevity likely also involves other traits. 
MATERIALS AND METHODS 

Study populations 

The discovery analysis was performed in 7729 cases that survived 
to ages >85 years (of which 5406 also survived to ages >90 years) 
and 16 121 controls below 65 years at baseline, from 14 studies. 
Replication was performed in 13 060 cases that survived to ages 
>85 years (of which 7330 also survived to ages >90 years) and 
61 156 controls below 65 years at baseline, from 6 additional 
studies. All individuals were of European descent. The details 
of the discovery and replication studies can be found in 
Supplementary Material, Tables S1 and S2. Some cohorts only provided 
controls (GOYA, NTR, SU.VI.MAX, TwinsUK and WTCCC2) 
or only cases (BELFAST, CEPH centenarian cohort, Danish lon- 
gevity study I/II, Leiden 85-plus Study I/II and Newcastle 85 + 
Study), whereas others contained both (Calabria cohort, 
deCODE, EGCUT, GEHA Study, German longevity study, 
Leiden Longevity Study, Rotterdam Study I/II and TwinGene). 
The names of the studies in the tables and figures are based on 
the names of the cohorts containing the cases. The cases and 
controls used for each study originated from the same country 
(Supplementary Material, Table S1). The only exception is 
BELFAST (Northern Ireland), for which we used controls from 
the NTR (Netherlands). A check in the PROSPER study, which 
includes individuals from Northern Ireland and the Netherlands, 
showed that the allele frequencies in control individuals from 
both countries are similar for our SNPs (data not shown). All par- 
ticipants provided written informed consent, and the study was 
approved by the relevant institutional review boards. 

Genotyping, imputation and genome-wide 
association analysis 

All discovery studies were genotyped using Illumina genotyping 
arrays, and pre-imputation quality control was performed for 
each study separately. Imputation was performed using 
IMPUTE or MACH with reference HapMap Phase I+II CEU 
release 22 (hg18/build36). Further details about the genotyping, 
quality control and imputation of each study are summarized in 
Supplementary Material, Table S2. 

Two replication studies (deCODE and the Danish longevity 
study II) were also genotyped using Illumina genotyping 
arrays and imputed using IMPUTE with reference HapMap 
Phase I+II CEU release 22 (hg18/build36) (Danish longevity 
study II) or deCODE software (deCODE). The other replication 
studies were genotyped with the Sequenom MassARRAY 
system using iPLEX Gold genotyping assays (Sequenom, San 
Diego, CA, USA). More information about the studies used in 
the replication phase can be found in Supplementary Material, 
Tables S1 and S2. Of the 15 SNPs measured with the Sequenom 
MassARRAY system, 13 were successfully genotyped in at least 
95% of the samples and the average genotyping call rate was 
99.80%. We also checked the concordance between the SNPs 
measured with the Sequenom MassARRAY system and 
(imputed) GWAS data of the Leiden 85-plus study I cases, and 
the average concordance rate was 99.07%. The two SNPs that 
were not successfully genotyped with the Sequenom MassARRAY 
system (rs10957550 and rs4420638) were only analyzed 
in the replication studies that had imputed GWAS data available 
(deCODE and the Danish longevity study II). 



All studies were analyzed separately using CC-assoc 
(https://www.msbi.nl/dnn/Research/Genetics/Software/TestsforGWASinrelatedindividuals(cc_assoc).aspx), 
which is based on a modified version of the score test that takes 
into account imputation uncertainty and familial relatedness (46). 
SNPs with a low imputation quality ($R_T^2$ < 40) and a MAF of <1% 
(or <5% if $n_{\mathrm{cases}}$ < 200) were excluded from analysis in 
the discovery phase. Adjustment for population stratification of the 
discovery studies was performed by multiplying the $R_T^2$-adjusted 
variances of the score statistic by the genomic inflation factor of 
the study ($\lambda$ range = 0.97-1.08, Supplementary Material, Table S2). 
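
The case-count-dependent MAF cutoff and the genomic-control adjustment described above can be illustrated with a minimal Python sketch. It is not the CC-assoc implementation: the function names are hypothetical, and the score `U`, its variance `V` and the inflation factor are assumed to be already available per SNP and per study. Multiplying the variance by $\lambda$ is equivalent to dividing the 1-df chi-square statistic by $\lambda$.

```python
import math


def maf_threshold(n_cases: int) -> float:
    """MAF cutoff from the text: 1%, or 5% for studies with fewer than 200 cases."""
    return 0.05 if n_cases < 200 else 0.01


def gc_adjusted_z(score: float, variance: float, lam: float) -> float:
    """Z-statistic of a score test after multiplying its variance by lambda.

    `score` and `variance` are assumed inputs (hypothetical names); the
    imputation-quality adjustment of the variance is taken as already done.
    """
    return score / math.sqrt(variance * lam)


def two_sided_p(z: float) -> float:
    """Two-sided P-value of a standard normal Z-statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

For example, `two_sided_p(gc_adjusted_z(u, v, 1.08))` gives the adjusted P-value of one SNP in a study with $\lambda$ = 1.08, where `u` and `v` stand for that SNP's score and variance.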

Meta-analyses 

For the meta-analyses, a fixed-effect approach was used. Scores 
and variances of the studies were combined to obtain a single 
meta-statistic, which was adjusted using the genomic inflation 
factor ($\lambda$ = 1.019, discovery phase only) (Supplementary 
Material, Fig. S1). For each analysis, we only used studies with at 
least 100 cases (Supplementary Material, Table S1). P-values 
<$5 \times 10^{-8}$ were considered genome-wide significant (47). To 
determine heterogeneity across the studies, the between-study 
variance was calculated. 
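
As an illustration of combining scores and variances, the sketch below sums per-study score statistics and variances into one Z-statistic and applies the genomic inflation factor to the pooled variance. The text does not state which estimator was used for the between-study variance; the DerSimonian-Laird moment estimator shown here is an assumption, as are all function names and the one-step effect estimate `U/V` per study.

```python
import math


def fixed_effect_meta(scores, variances, lam=1.0):
    """Fixed-effect meta-statistic from per-study scores U_i and variances V_i."""
    u_total = sum(scores)
    v_total = sum(variances) * lam  # genomic-control inflation of the pooled variance
    z = u_total / math.sqrt(v_total)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P-value
    return z, p


def between_study_variance(scores, variances):
    """DerSimonian-Laird tau^2 (assumed estimator, not confirmed by the text).

    Per-study effects are approximated as beta_i = U_i / V_i with weights V_i.
    """
    k = len(scores)
    v_sum = sum(variances)
    pooled = sum(scores) / v_sum
    q = sum(v * (u / v - pooled) ** 2 for u, v in zip(scores, variances))
    c = v_sum - sum(v * v for v in variances) / v_sum
    if c <= 0:
        return 0.0
    return max(0.0, (q - (k - 1)) / c)
```

A call such as `fixed_effect_meta(scores, variances, lam=1.019)` mirrors the discovery-phase adjustment, with `scores` and `variances` holding one value per contributing study.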

Conditional analysis 

To ascertain independent signals at the chromosome 19q13.32 
locus, we performed a meta-analysis conditional on rs4420638 
in all studies used for the discovery phase analysis in cases 
aged >85 years. The results are depicted in Supplementary 
Material, Figure S2 and Table S3. 

Sex-stratified analysis 

Sex-stratified analysis of the cases aged >85 years 
($n_{\mathrm{women}}$ = 5400 and $n_{\mathrm{men}}$ = 1865) was performed 
to investigate the presence of gender-dependent associations. In 
addition, the 15 loci that showed (suggestive) evidence for association 
with survival >85 and/or >90 years were tested for differences between 
sexes using the formula: 

$$\frac{\beta_{\mathrm{women}} - \beta_{\mathrm{men}}}{\sqrt{SE_{\mathrm{women}}^{2} + SE_{\mathrm{men}}^{2}}}$$

The results of this analysis are depicted in Supplementary Material, 
Table S4. 
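
The sex-difference formula can be evaluated as in the following minimal Python sketch. The function name is hypothetical, and the conversion to a two-sided P-value assumes the statistic is approximately standard normal under the null hypothesis of no difference, which the text does not state explicitly.

```python
import math


def sex_difference_test(beta_women, se_women, beta_men, se_men):
    """Difference in effect size between sexes divided by its standard error."""
    z = (beta_women - beta_men) / math.sqrt(se_women ** 2 + se_men ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided, assuming normality
    return z, p
```

The inputs are the per-sex effect estimates and standard errors of one SNP from the sex-stratified analysis.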

Prospective analysis 

Prospective analysis of rs2149954 and rs4420638 was per- 
formed using a Cox proportional hazards model adjusted for 
age at baseline, sex and study-specific covariates. The details 
about each of the analyzed cohorts are summarized in Supple- 
mentary Material, Table S5. 

Pathway analysis 

For the pathway analysis, we used GSEA implemented 
in MAGENTA (http://www.broadinstitute.org/mpg/magenta/) 
(33). In short, each SNP is mapped to a gene considering a 
window of 110 kb upstream and 40 kb downstream around the 
genes. Subsequently, each gene is assigned a gene association 
score based on the SNP with the lowest P-value, which is 
mapped to that gene and this score is adjusted for confounding 
factors like gene size and the number of SNPs per kb. Genes within 
the HLA region were removed from analysis owing to high LD and 
high gene density in that region. The GSEA algorithm tests for 
over-representation of adjusted gene scores in a given pathway 
using a pre-defined score rank cutoff (in our case, the 95th and 
75th percentile). The generated statistic is then compared with 
10 000-1 000 000 gene sets of identical size randomly sampled 
from the genome to generate an empirical P-value for each pathway. 
In total, 3216 pathways from Gene Ontology, PANTHER, 
Ingenuity, KEGG, REACTOME and BIOCARTA were tested. 
Pathways were considered significant if the FDR-adjusted 
P-value (the 95th or 75th percentile) was <0.05. 
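
The enrichment test described above can be sketched in simplified form as follows. This is not MAGENTA's code: the function name, the convention that a higher adjusted gene score means stronger association, and the exact handling of the percentile cutoff are all assumptions made for illustration. The sketch counts pathway genes at or above the cutoff and compares that count with randomly sampled gene sets of identical size.

```python
import random


def enrichment_p(gene_scores, pathway, percentile=95, n_perm=10000, seed=0):
    """Empirical enrichment P-value for one pathway (illustrative only).

    gene_scores: dict mapping gene -> adjusted score (assumed: higher = stronger).
    pathway: iterable of gene names; genes without a score are ignored.
    """
    rng = random.Random(seed)
    genes = list(gene_scores)
    ranked = sorted(gene_scores.values())
    # Score at the requested percentile of all genes (simple index-based cutoff).
    cutoff = ranked[min(len(ranked) - 1, int(len(ranked) * percentile / 100))]
    members = [g for g in pathway if g in gene_scores]
    observed = sum(gene_scores[g] >= cutoff for g in members)
    hits = 0
    for _ in range(n_perm):
        # Random gene set of identical size drawn from all scored genes.
        sample = rng.sample(genes, len(members))
        if sum(gene_scores[g] >= cutoff for g in sample) >= observed:
            hits += 1
    # Add-one correction keeps the empirical P-value above zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Running the function once per pathway at `percentile=95` and `percentile=75` yields the P-values that would then be FDR-adjusted across all tested pathways.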

To determine the relationship between loci associated with survival 
>90 years, we used GRAIL (http://www.broadinstitute.org/mpg/grail/) 
(34). In short, this program maps SNPs to genes 
and subsequently uses a text-mining algorithm on PubMed 
abstracts to determine connections between these genes. Genes 
from independent loci, which share informative words, receive 
a high GRAIL similarity score and are more likely to be function- 
ally related. As we only had a limited number of loci with at least 
one SNP with a P-value <$1 \times 10^{-5}$ (n = 12, Table 2), we decided 
to perform GRAIL analysis on all loci with at least one SNP with a 
P-value <$1 \times 10^{-4}$ (n = 65). 



eQTL analysis 

To determine whether rs2149954 or SNPs in LD ($r^2$ > 0.8, based on 
1000 Genomes CEU Phase 1 data) influenced gene expression, we 
searched several eQTL databases, namely (1) the Gutenberg Heart 
Study database (GHS_Express) (48), which is based on expression 
data of monocytes; (2) the Genotype-Tissue Expression (GTEx) 
eQTL database (http://www.ncbi.nlm.nih.gov/gtex/GTEX2/gtex.cgi), 
which is based on expression data of brain (cerebellum, 
frontal cortex, temporal cortex and pons), liver and lymphoblastoid 
cell lines; (3) the GENe Expression VARiation (Genevar) database 
(http://www.sanger.ac.uk/resources/software/genevar/), which is 
based on expression data of adipose tissue, fibroblasts, T cells, 
skin and lymphoblastoid cell lines (49) and (4) the Blood eQTL 
browser (http://genenetwork.nl/bloodeqtlbrowser/) (50). 



SUPPLEMENTARY MATERIAL 

Supplementary Material is available at HMG online. 